A generalized algorithm for estimating the parameters of a 



multivariate, coupled diffusion system 



Melvin M. Varughese 

Department of Statistical Science 
Building 28 (P.D. Hahn), Room 3.21 
University of Cape Town 
7701 Rondebosch 
South Africa 
melvin. varughese@uct.ac.za 



Abstract 

Diffusion processes are used to model a wide range of real-world phenomena. 
Consequently, the ability to estimate the diffusion parameters from available time series 
data would greatly further our understanding of such phenomena. A new algorithm is 
proposed to estimate the parameters for a Fokker-Planck system. The algorithm is 
applicable for a wide class of multivariate diffusion systems. Not only does the method 
provide reliable parameter estimates in a highly computationally efficient manner, it can 
also produce credibility intervals for these parameters. It is demonstrated that the 
credibility intervals have coverage close to their advertised values. Hence, the algorithm 
produces parameter estimates with limited systematic biases. 

Keywords - Diffusion processes, Parameter estimation, Fokker-Planck equations, Cumulant 
truncation procedure, Saddle point approximation, Markov Chain Monte Carlo 



1) Introduction 

Diffusion processes are continuous-time, continuous-space processes that have proven to be natural 
modelling frameworks for many real world phenomena. Indeed, diffusion processes have found 
application in fields that are startlingly diverse. These include finance, ecology, physics, and 
meteorology amongst others. Central to any diffusion process are its drift and diffusion coefficients. 
These represent the diffusion process' instantaneous mean and variance respectively. For a scalar 
diffusion process, (j) , the instantaneous mean, ft and instantaneous variance, a are given by: 
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The behaviour of a univariate diffusion process is dictated by its drift and diffusion coefficients: 

where (t) is the probability density function for the diffusion process^ 
Note that we allow the drift and the diffusion coefficients to be functions of both time and the 

current value (j) of the diffusion process. Together, these two coefficients characterize the diffusion 

process. Let p{(f){t + At)\(/)(t)) be the conditional probability distribution of the diffusion process at 

time t + At given the value of the diffusion process at time t . For At sufficiently small, we have: 

p((f>{t + At)\0(t)) ~ Normal((f)(t) + /3{(f>{t),t)At, a{(f>{t),t)At) ( 2 ) 
Since the transitional probability at time t + At only depends on the current value of the process 

(j>(t), a diffusion process is Markovian. 

The univariate model is easily generalized to the multivariate case. Consider a system of p 

interacting diffusion processes O r = ■■■,0 P )- As before, we assume each of the p diffusion 

processes is characterized by its drift and diffusion coefficients. In the multivariate case, not only 
does the j'-th process's drift and diffusion depend both on time and the current value of the j'-th 
process ^, , it depends in addition on the current state of the entire system, O . The resulting 
equation is: 

dt - dfc 2 5 dft 

where /?, and a i represent the instantaneous mean and variance of <j) i 

p^ (t) is the multivaria te probability distribution of the diffusion systemO ^) 



Equation (3) represents a special case of the Fokker-Planck equations. (In general, the instantaneous 
covariances between the p diffusion processes are non-zero and hence must also be specified.) We 
avoid estimating the parameters for the more general Fokker-Planck equations, as the parameters of 
the resulting Fokker-Planck equations will often be degenerate thus making parameter estimation 
less reliable. 

The primary objective of this paper is to provide a general, statistical method for estimating the 

parameter vector, 6 of a diffusion system described by equation (3). The parameter vector affects 
both the drift and the diffusion coefficients. Strictly speaking, these should be written out as 

pfe,t\0) and afa^o) respectively. However, unless needed, the dependence of the 

coefficients on the parameter vector 6 is suppressed within the notation. 

Maximum likelihood estimation of the Fokker-Planck equations is difficult: except for a few 
special cases, equation (3) is not analytically tractable. Without the probability density function, 
maximum likelihood estimates of the diffusion parameters cannot be obtained. In addition, many 
theoretical results - such as the time it takes for the system to hit a barrier - often assume that the 
probability distribution is known. Consequently, much work has focused on deriving the 
approximate evolution of the probability density function over time. The resulting probability 
distributions can be used to derive maximum likelihood estimates. A simple approach to estimating 
the probability distribution function would be to use Monte Carlo estimates (see Kleinhans and 
Friedrich, 2007). However, in order to reliably estimate the probability distribution, a large number 
of Monte Carlo runs are needed for each set of parameter values lying on a grid. This is 
computationally burdensome - especially if there are a large number of parameters to be estimated. 
Roberts and Stramer (2001) instead proposed imputing diffusion paths between neighbouring data 
points and using the resulting trajectories, together with equation (2), to estimate the parameters. 

It is well known that analytical expressions for the evolution of the probability distribution over 
time exist for special cases of the diffusion process. Haken (1975) demonstrated that an analytical 
expression for the diffusion processes' stationary distribution can be found for a larger class of 
diffusion processes. However, in general, analytical expressions are not possible and hence 
numerical methods must be used. Numerical methods for solving equation (1) were classified by Wei 
(2000) into global and local methods. Under global methods, functions (together with their 
derivatives) related to the distribution function are expanded in terms of a finite basis set. The 
expansion coefficients are consequently solved for. Examples of such methods include the weighted 
residual method (see Di Paola and Sofi, 2002). In contrast, local methods approximate the 
continuous diffusion process by a discrete set of infinitesimally spaced lattice points that vary in 



space and/or time. The finite element method creates a lattice over the spatial domain (see 
Wojtkiewicz and Bergman, 2000). Proposed interpolation schemes between the resulting grid points 
include Hermite polynomial expansions (see Spencer and Bergman, 1993). Banks et al. (1993) 
suggested using a finite-element scheme to calculate the likelihood of the diffusion process and 
consequently maximizing this to get the resulting parameter estimates. The finite difference method 
discretizes the time domain, taking advantage of the fact that over an infinitesimally small time 
period, the diffusion process is normally distributed (see Wehner and Wolfer, 1987). Hum et al. 
(2007) used such a scheme to derive the likelihood and the maximum likelihood estimates. Some 
schemes - such as the path-integral solutions - discretize both in time and in space (see Wehner and 
Wolfer, 1987). Though such grid-based schemes often work well for a diffusion system of low 
dimension, they often prove to be computationally intractable for p > 4 (see Wojtkiewicz and 
Bergman, 2000). 

A wide range of alternatives to maximum likelihood parameter estimation have been proposed. 
Ragwitz and Kantz (2001) suggested that, with appropriate corrections, the instantaneous means 
and variances can be estimated by the corresponding sample moments for the differenced data. 
However, in making the corrections, Ragwitz and Kantz (2001) assumed that the finite differences 
are normally distributed (see Friedrich et al. (2002) as well as Ragwitz and Kantz (2002)). For sparse 
data, this approximation can be highly inaccurate. Kleinhans et al. (2005) suggested that the 
diffusion and drift could be estimated by minimizing the Kullback-Leibler information. However, the 
method assumes that the stationary distribution of the diffusion process is known. For many 
systems, a stationary solution does not exist. Even for systems that have a stationary distribution, a 
large amount of data would be needed to reliably estimate the stationary distribution. This is not 
always available. Gallant (1997) proposed parameter estimates that minimized a modified chi- 
squared based on a Taylor-series expansion of the score function. It was demonstrated that the 
efficiency of the resulting estimates approaches the maximum likelihood estimates as the data size 
increases. 

Beskos et al. (2006) proposed a more computationally efficient method of simulating diffusion 
paths. The key advantage of this procedure - which they called the "Exact Algorithm" - is that 
diffusion paths can be simulated for different parameter values simultaneously. These can then be 
used to estimate the diffusion's probability density; which can in turn be used to calculate the 
likelihood. However, the procedure can only be applied to processes that are transformable into 
unit-diffusion processes. Only special cases of equation (3) satisfy this criterion. In this paper, an 
alternative computationally efficient parameter estimation scheme without this restriction is 
proposed. (It should be noted however, the proposed method requires the drift and diffusion 



coefficients to be expressible as finite-order polynomials - a restriction that Exact Algorithm does 
not require.) 

Central to the proposed method is the cumulant truncation procedure (see Whittle, 1957). If the 
diffusion coefficients are finite-order polynomials in terms of O (though not necessarily in terms of 
t), the cumulants of the diffusion system can be derived using the cumulant truncation procedure 
(see Varughese, in press). This is crucial as instead of having to solve a partial differential equation in 
order to get the probability distribution and then the likelihood, the cumulant truncation procedure 
allows us to calculate the system's cumulants by solving a system of ordinary differential equations. 
This proves to be far less computationally burdensome. We propose approximating a diffusion 
system's probability distribution by a saddle-point distribution, substituting for the cumulants 
obtained from the cumulant truncation procedure. An MCMC procedure can then be used with the 
resulting likelihoods to obtain parameter estimates. 

The advantages of the proposed method are numerous. Firstly, unlike some of the 
aforementioned methods, the method is applicable to multivariate diffusion systems of arbitrary 
dimension, p . Furthermore, unlike grid-based methods, the computational time does not scale 
drastically with increasing dimension. The accuracy of the likelihoods is also controllable within the 
cumulant truncation procedure: if higher accuracy is needed, truncation should be performed at 
higher orders. The use of an MCMC procedure also means that Bayesian credibility intervals for each 
of the diffusion parameters can be obtained in addition to the parameter estimates. Indeed, due to 
the superior computational efficiency of the proposed method, a study of the empirical coverage of 
the resulting credibility intervals is feasible. This is, to the best of the author's knowledge, the first 
time such a study has been undertaken for proposed confidence intervals of Fokker-Planck 
estimates. 

2) The likelihood function 

For a system of p processes observed at equally spaced times t = 1, 2, ...,n we represent the data 

by the matrix X T = (X 1 ,X 2 , ...,X n ) (where X i =(X n , X j2 ,...,X jp ) is the observed process vector 

at time i). The likelihood is given by the joint density of the n random vectors expressed as a 

function of the model parameters 6 . Since diffusion processes are Markovian, we can represent the 
likelihood by: 




i=2 




f\X t X t _{, lis the density of the system at time i conditional on its state at time i - 1 



(4) 



The probability densities in (4) are functions of the proposed parameter vector . As the length of 
the time series increases, the influence of the stationary distribution on the likelihood becomes less. 
Hence, for large time series, the stationary distribution is not required. Since, in general, equation (3) 
is analytically intractable; the exact form of the probability densities in equation (4) is unknown. 
However, these probability densities can be approximated by a saddle-point distribution. 

A saddle-point approximation is a probability distribution whose parameters are the 
distribution's corresponding cumulants up to a chosen order. Hence, if the cumulants of a system 
are known, a saddle-point distribution can be used to approximate the system's (unknown) 
probability densities. Since the parameters of a multivariate normal distribution are its mean vector 
and covariance matrix, the multivariate normal can be thought of as a saddle-point approximation of 
second order. Barndorff-Nielsen and Kluppelberg (1999) derived a multivariate saddle-point 
approximation of arbitrary cumulant order. All other things being equal, the accuracy of the saddle- 
point approximation will increase as the number of cumulants used as saddle-point parameters 
increases (see Varughese, in press). The accuracy should also increase the more closely spaced the 
data is. This is due to the probability distribution of the system governed by equation (3) becoming 
more Normal - and hence better approximated by a saddle-point - as the time interval becomes 
smaller (see Ragwitz and Kantz, 2001). 

By substituting the saddle-point approximation within equation (4), the approximate likelihood 

for a given parameter vector can be derived. This enables the MCMC procedure to be used to 
explore the system's parameter space and hence to derive credibility intervals for each of the system 
parameters. All this however is dependent on the ability to efficiently predict, for a system governed 
by equation (3), the values of the required cumulants at an arbitrary time and for an arbitrary 

parameter vector 6 . This is indeed possible with the cumulant truncation procedure (see Isham 
(1991) and Whittle (1957)). 

3) The Cumulant Truncation Procedure 

The key advantage of the cumulant truncation procedure is that, instead of requiring the solution of 
the partial differential equation in (3), it requires the solution of a system of ordinary differential 
equations. This proves to be far more computationally tractable. 

Any system described by equation (3) can also be characterized by the evolution of its moment 
generating function (MGF), M. Let A r =(u 1 ,u 2 ,... J u ) be the MGF parameters for the p 

processes. In addition, let A* =(d/dv l ,d/dv 2 ,...j3/dv p ) T . Like the differential equation for the 



probability density (given in (3)), there is a corresponding differential equation for the moment 
generating function, M(A,t) = E^e" 1 * 1 *" 2 * 2 *"* **" J ( see varughese and Fatti, 2008): 
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dv x dv 2 



do, dv i 

Though, in principle, one can derive a system of ordinary differential equations directly from the 
partial differential equation in (5), the cumulant truncation procedure is instead based on the 
matching partial differential equation for the cumulant generating function, K(A,t). This is because 
K(A,t) can be represented by the series expansion: 
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The first terms in the expansion are easy to interpret: each corresponds to a centred moment of the 
system. This is the reason, that the truncation procedure is based on K(A,t) instead of M(A,t). 
Since K(A,t) = \nM(A,t), we have: 

— = whereM and K depend on the variable x 

(7) 



M dx dx 

By repeatedly applying the relation in (7), it is possible to derive the corresponding partial 
differential equation for the cumulant generating function from the partial differential equation for 
the MGF (equation (5)). The general form of this equation cannot be shown, but rather must be 
independently derived for each system of interest. This is shown later. 

By substituting the series expansion in (6) within the partial differential equation for K(A,t) and 



then matching the cumulant coefficients, 
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: m i = 0,1,2,...;! < i < p k it is possible 



to derive a system of ordinary differential equations for the centred moments or cumulants. 
Consequently, by solving this system of differential equations, it is possible to predict the 

approximate evolution of the cumulants for each proposed set of parameters 6 . This is key, as we 
can use the predicted cumulants within the saddle-point approximations to derive the approximate 



value of the likelihood at . The derivation of the approximate likelihood for different sets of 
parameters 6 enables the system parameters to be estimated. 

4) The Parameter Estimation Algorithm 

The system parameters can be estimated by the following algorithm: 

1) Apply the cumulant truncation procedure to the Fokker-Planck model thus deriving a system 
of ordinary differential equations that describe the cumulants' evolution. (Note these 

differential equations will depend on the model parameters .) 

2) Choose an arbitrary, starting set of parameter values O . Set old = O . 

3) Propose a jump from the old set of parameter values old to a new set of parameter values, 

6 new =0 oId + A0 where A0 is drawn from a proposal distribution. 

4) Calculate the likelihoods L,(& old ^ and L(0 new }. ^(#) is calculated as follows: 

i. Set the likelihood, L{0) = \ 

ii. Set i to 1 

iii. The initial process vector, O =(^,^ 2 ,...^ ) is set to the data values observed at 
time i , X i . 

iv. Use the system of ordinary differential equations from Step 1 together with the 
current initial process vector cD to predict the values of the cumulants at time j' + l. 

v. Given the data at time i , X t , the multivariate distribution of <J> at time i + l can 

be approximated by a saddle point approximation J]i^>) after substituting for the 
cumulants derived from step iv. 

vi. Set the likelihood, L(0) = L(0)xtj(x m ). This is an implementation of equation (4). 

(Note r/(x i+1 ) is the conditional probability of observing the data vector X i 

conditional on the observed process vector at time i , X t .) 

vii. Set i-i + l and, if i < n go to step iii. 

5) Accept the proposed parameter values, new with probability p where 

fi a" L(eJ\>L(e Mi 



old , 

That is, set old = new with probability p . Otherwise leave old unchanged. 
6) Go to step 3. 



The algorithm is a modified version of the basic MCMC procedure: the saddle-point approximation 
resulting after the implementation of the cumulant truncation procedure is used to calculate the 
probability of accepting the proposed MCMC steps. Therefore, after a suitable burn-in period, the 
MCMC steps sample from the posterior distribution of the parameters. The mechanics of the 
parameter estimation algorithm will be illustrated by way of an example. 

5) Example 

Consider a diffusion system 2 ) with the following instantaneous means and variances: 

fi A {<f>i,<f>2)=a<f> l <f> 2 -b<f> 2 ; a^ 2 ) = c A 

The diffusion equations can be written in stochastic differential form: 

d(j) x = [a<f) x <j) 2 -b<fi 2 }dt + c(f) 2 dB t 
d((> 2 = g{(f -<p 2 pt + <j 2 dB t 

The process (j) y is driven by fluctuations in the process </> 2 , which behaves according to an Ornstein- 
Uhlenbeck process. By substituting the diffusion coefficients within equation (3), we obtain: 
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It follows, from equation (5), that the MGF, M{o l ,o 2 ,t) is described by: 
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Using equation (7), one can for example deduce that: 
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Thus by dividing both sides of equation (9) by M and then repeatedly applying equation (7), one 
can derive the corresponding differential equation for the cumulant generating function: 
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(10) 



The cumulant generating function can be approximated by a truncation of the series expansion 
(shown in (6)). A third order cumulant truncation gives: 



K{o x , 2 ,t) = 1 + V X K W {t) + O 2 K 0l (t) + 0,0 2 K X , {t) + ^j- IC 20 (t) + ^j- fC Q2 (t) 



(11) 



By substituting the series expansion in (11) within equation (10) and subsequently equating the 
coefficients \p l 'v 2 J = 0,1,2,...), we obtain the following system of ordinary differential 
equations: 

k w (t) = afa , (0 + k w (t)K m (0) - b(rc w (t) 2 + k 2Q (0) 

k x , (0 = a{K u (t)K m (t) + k 10 (t)tc m (0) -2bK n (0«i (0 " i (0 
k- 20 (0 = 2a(nc I , (0«i (0 + k- 20 OKi (0) - 4 ^io (0^20 (0 + c^oi (0 
k 02 (t) = a 2 -2gK 02 (t) 

If we were to truncate equation (10) at a higher order, we would in addition be able to predict the 
skewness of the system. The solutions of these equations predict the cumulant values over time. 
Since we are predicting just the means and covariances, the saddle-point approximation is bivariate 
normal. This can consequently be used to estimate the likelihoods for different parameter values. 

The diffusion system has 6 parameters: a, b, c, g, (j) and a . Suppose the values of the 
parameters are as follows: 

a = 0.1, b = 0.02, c = 1.8, g = 0.5, f = 5, a = 1 
Under a typical scenario, the true parameter values will be unknown. These values will need to be 
estimated from a finite set of data spaced at discrete points in time. We test the algorithm by 
applying it to a simulated set of data and comparing the parameter estimates to the true values. If 
the estimates are close to the true values, it suggests that the procedure works well. 

Suppose the data for the proposed diffusion system is only observed at integer points in time. 
The diagram below shows a simulation of the diffusion system: 
value 
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Figure 1: A simulated set of data for the diffusion system 



time 



The dotted line is a simulation of (j) x and the solid line is a simulation of (j> 2 . Using the data, together 
with the previously derived system of ordinary differential equations, we implement the estimation 
algorithm thus obtaining estimates for the six parameters. Since we are running a modified MCMC 
procedure, we can also estimate the posterior distributions of the six parameters. Hence, since the 
true parameter values are known, not only does the procedure provide parameter estimates, it also 
enables p-values to be calculated for each of the parameters. 

Two MCMC chains of length 55,000 were run. The starting points for the chains were: 

a -0.104, b = 0.017, c = 1.525, g = 0.651, 0*=6.64, a -1.001 

a = 0.5, b - 0.1, c - 1, g - 0.7, <f = 7, a = 2 
By starting the chains from very different points in the parameter space, the sensitivity of the 
parameter estimates to the starting values is explored. For the second set of starting parameters, 
values were chosen that were very far from the true values. Reassuringly, the inferences from both 
chains were identical after a burn-in period of length 5,000 was trimmed. The results for both chains 
are summarised in the table below: 



Parameter 


True value 


Estimated value 


Standard error 


p-value 




0.1 


0.12 


0.03 


0.340 


b 


0.02 


0.023 


0.0045 


0.349 




g 


0.5 


0.737 


0.231 


0.227 


(j)* 5 5 01 17 923 


cr 


1 


1.09 


0.128 


0.515 



Table 1: Summary of results from estimation algorithm 



The estimates in the above table were obtained from the simulated data in Figure 1. A quick 
comparison between the true values and the estimated values from the algorithm reveals that many 
of the parameter estimates are close to the true values. Note also that none of the p-values for the 
statistics are significant. That is, none of the estimates are significantly different from the true 
values. 



The diagram below shows a plot of the MCMC sampling distribution for the six parameters. 
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Figure 2: The MCMC sampling distribution for the system parameters 

The above plots may be thought of as estimates of the posterior distributions of the system 
parameters. By taking the appropriate percentiles from the sampled data, one can construct 
credibility intervals for the parameters. 

Thus far, the virtues of the estimation algorithm have only been demonstrated for a single set of 
data. Attention is now turned to the coverage of the credibility intervals. This is, to the best of the 
author's knowledge, the first such study of the empirical coverage of proposed confidence intervals 
for Fokker-Planck parameters. In itself, this serves to highlight the computational efficiency of the 
estimation algorithm as such studies are not feasible with alternative methods of parameter 
estimation. 



The coverage of the confidence intervals provides insight into the statistical properties of the 
estimation algorithm. Empirical coverage close to the claimed level provides compelling evidence 
that the saddle-point approximation does not introduce systematic biases within the resulting 
estimates. We define 90% credibility intervals by the 5 th and 95 th percentiles of the sampled MCMC 
chains. 100 simulations of the diffusion system are run. For each simulation, the corresponding 
credibility interval is calculated. The proportions of the intervals that contain their matching true 
value are: 

a : 80/100 ; b : 82/100 ; c : 83/100 ; g : 87/100 ; f : 82/100 ; a : 81/100 
These proportions seem to indicate that the credibility intervals suffer from under-coverage. 
However, the proportions are close enough to 0.90 for this to be simply due to randomness. (Note 
the observed coverage's for each of the six parameters are correlated.) Certainly, some (perhaps all) 
of the under-coverage could be as a result of the sampling variability associated with the 5 th and 95 th 
sample percentiles. For each simulation, only 5000 parameter values were sampled by the modified 
MCMC algorithm. Since the MCMC steps are correlated, the sample percentiles could have 
substantial variability. Ideally, the credibility intervals should be made wider to account for this extra 
variability. 

6) Conclusions 

This paper proposes an innovative, computationally efficient means to estimate the parameters for a 
diffusion process. This method is far less computationally demanding than virtually all the methods 
that have been proposed to date. Though the method proposed by Beskos et al. (2006) is efficient, 
the method does not seem to be readily generalized to multi-dimensional diffusion systems. 
Furthermore, the method requires the diffusion process to be transformable to a unit diffusion 
process. This condition won't be met by many diffusion processes. (Our approach requires the drift 
and diffusion processes to be finite-order polynomials; which is also a restriction.) Also, since our 
proposed method uses MCMC, we can obtain credibility intervals for each of our parameter 
estimates. It was demonstrated that the estimation algorithm provides precise estimates of the true 
parameter values independently of the starting values of the MCMC chain. The empirical coverage of 
the estimation algorithm's credibility intervals was also studied. Though some under-coverage was 
detected, this was within the sampling error. In addition, the detected under-coverage could 
possibly be explained by the randomness of the sample percentiles. The method has thus been 
demonstrated to be computationally efficient; applicable to a wide class of diffusion systems as well 
as provide reliable estimates of the parameters and their credibility intervals. 
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